Abstract. Masking, adaptation, and summation paradigms have been used to investigate the 
characteristics of early spatio-temporal vision. Each has been taken to provide evidence for (i) 
oriented and (ii) nonoriented spatial-filtering mechanisms. However, subsequent findings suggest that 
the evidence for nonoriented mechanisms has been misinterpreted: those experiments might have 
revealed the characteristics of suppression (eg, gain control), not excitation, or merely the isotropic 
subunits of the oriented detecting mechanisms. To shed light on this, we used all three paradigms 
to focus on the 'high-speed' corner of spatio-temporal vision (low spatial frequency, high temporal 
frequency), where cross-oriented achromatic effects are greatest. We used flickering Gabor patches 
as targets and a 2IFC procedure for monocular, binocular, and dichoptic stimulus presentations. To 
account for our results, we devised a simple model involving an isotropic monocular filter-stage 
feeding orientation-tuned binocular filters. Both filter stages are adaptable, and their outputs are 
available to the decision stage following nonlinear contrast transduction. However, the monocular 
isotropic filters (i) adapt only to high-speed stimuli — consistent with a magnocellular subcortical 
substrate — and (ii) benefit decision making only for high-speed stimuli (ie, isotropic monocular outputs 
are available only for high-speed stimuli). According to this model, the visual processes revealed by 
masking, adaptation, and summation are related but not identical. 

Keywords: masking, adaptation, subthreshold summation, contrast detection, human vision. 
1 Introduction 

Our visual perceptions of the world around us are derived from the dynamic retinal images 
on the backs of our eyes. These drive complex decision-making processes that control many 
of our interactions with the world. But how are the spatio-temporal retinal images encoded 
by our nervous system? Or put another way, what is the form of the neuronal representation 
that observers use for their decision making? 

The textbook answer is that isotropic spatial filtering in the retina and lateral geniculate 
nucleus (LGN) is followed by orientation-tuned filtering in the primary visual cortex (V1). 
This view is supported by evidence from single-cell physiology, visual psychophysics, and 
various neural imaging techniques (Hubel and Wiesel 1962, 1968; Blakemore and Campbell 
1969; Phillips and Wilson 1984; Bonhoeffer and Grinvald 1991). For example, from single- 
cell recordings we know that the receptive fields of cells in the retina and the LGN are 
approximately circular, having antagonistic centres and surrounds, or superimposed regions 
of antagonism in a push-pull arrangement. In contrast, cortical cells have elongated receptive 
fields that are well suited to detecting elongated image structures such as edges but that do 
not respond to edges, bars, or gratings that are oriented at right angles to their preferred 
orientations. The observer's decision making in behavioural tasks is presumably cortical and 
is thought to tap the outputs of these orientation-tuned cells, in simple contrast detection 
tasks at least. In other words, although the visual analysis begins with orientation-insensitive 






T S Meese, D H Baker 



(isotropic) filtering (using circular receptive fields), the visual code upon which an observer's 
decision making is made is distributed across orientation-tuned filters. 

1.1 Psychophysical evidence for the textbook model of early vision 

Evidence from at least three psychophysical paradigms has been taken to support the 
textbook model above: (i) threshold elevation produced by contrast masking is orientation 
selective (Phillips and Wilson 1984; Meese and Holmes 2010), (ii) threshold elevation 
produced by contrast adaptation is orientation selective (Blakemore and Campbell 1969; 
Snowden 1992), and (iii) there is little or no subthreshold summation of contrast between 
pairs of superimposed gratings with very different orientations (Kulikowski et al 1973; Phillips 
and Wilson 1984; Georgeson and Shackleton 1994). None of these results would be expected if 
observers were to make their psychophysical decisions based on the outputs of mechanisms 
with nonoriented (eg, isotropic) receptive fields. 

Intriguingly, though, the refinement in filtering (from isotropic to oriented) is counter- 
productive for the contrast detection of patterns containing image structure at more than 
one orientation. For example, consider a plaid made from a pair of superimposed sine-wave 
gratings with equal contrasts and oriented at right angles to each other. The Michelson 
contrast of this plaid is twice that of each of its components. Similarly, the responses of 
linear isotropic filter mechanisms situated at the peaks of the plaid are twice as high as the 
best response to just one of its components. Thus, the benefits to contrast detection that 
are available from isotropic filtering in the retina and LGN are lost to the decision-making 
processes, owing to the oriented cortical filtering — in the textbook model of vision, at least 
(figure 1a). 
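
The contrast doubling described above can be checked numerically. The following is a minimal sketch, assuming unit mean luminance, a component contrast of 0.1, and a 64 × 64 sampling grid (all illustrative values, not taken from the experiments); the function names are hypothetical.

```python
import math

def michelson(values):
    """Michelson contrast (Lmax - Lmin) / (Lmax + Lmin) of sampled luminances."""
    lo, hi = min(values), max(values)
    return (hi - lo) / (hi + lo)

def sample_pattern(contrast, plaid, n=64, mean_lum=1.0):
    """Sample one period of a grating, or of an orthogonal plaid, on an n x n grid.

    The coordinates i and j run along the two orthogonal grating axes, so no
    explicit rotation is needed. Grid size and mean luminance are assumptions.
    """
    samples = []
    for i in range(n):
        for j in range(n):
            g1 = contrast * math.sin(2 * math.pi * i / n)
            g2 = contrast * math.sin(2 * math.pi * j / n) if plaid else 0.0
            samples.append(mean_lum * (1.0 + g1 + g2))
    return samples

component_contrast = 0.1  # assumed contrast of each component grating
grating_c = michelson(sample_pattern(component_contrast, plaid=False))
plaid_c = michelson(sample_pattern(component_contrast, plaid=True))
ratio = plaid_c / grating_c  # plaid contrast relative to one component
```

Under these assumptions the ratio comes out at 2, which is the advantage that a linear isotropic filter centred on a plaid peak would enjoy over a filter that sees only one component.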







Figure 1. Three schematic models of achromatic spatio-temporal filtering in early vision. The filter 
outputs are available to the observer for decision making. The rose/lavender colour gradient indicates 
the linear speed gradient across spatio-temporal space from fast (top left) to slow (bottom right). In all 
cases, the filtering outside the high-speed corner is thought to be oriented (rosettes). The filtering in the 
high-speed corner might be (a) oriented, (b) nonoriented (here we suppose isotropic), or (c) a mixture 
of the two. Our results point to the scheme in (c). They also indicate that the oriented mechanisms are 
binocular (red), whereas the nonoriented mechanisms are monocular (blue). The allocation of the 
magnocellular and parvocellular labels is an idealised interpretation of Derrington and Lennie (1984) 
and Merigan and Maunsell (1993). Notes: SF: spatial frequency; TF: temporal frequency. 

1.2 More masking, adaptation, and summation studies 

The simple textbook picture outlined above (figure 1a) is a convenient one, but it has been 
complicated by subsequent experiments, which require a little untangling. Contrary to 
the textbook model, Burbeck and Kelly (1981) found that masks oriented at right angles 
to the target were able to raise threshold after all. However, this effect was limited to very 
low spatial frequencies (SF) and high temporal frequencies (TF): the high-speed corner of 
spatio-temporal frequency space (where speed = TF/SF; Burbeck and Kelly 1981; Meese and 



Nonoriented filters are monocular and adaptable 






Holmes 2007). A further study by Ferrera and Wilson (1985) found a similar result. The authors 
of both studies concluded that the detecting mechanisms for their oriented targets were 
nonoriented. (By 'detecting mechanism' we mean the mechanism whose output is used for 
decision making.) This conclusion follows from a 'within-channel' model of masking (Legge 
and Foley 1980; Wilson et al 1983), where masking is caused by a reduction in signal-to-noise 
ratio brought about by excitation of the detecting mechanism by the mask. As the masks and 
targets were at right angles to each other, the detecting mechanisms must be nonoriented, 
quite possibly isotropic. As it is unlikely that detecting mechanisms are subcortical, one 
obvious interpretation is that there is a set of isotropic detecting mechanisms in the cortex 
that can be accessed by the observer when performing a detection task (figure 1b). 

Single-cell recordings also provide some support for the conclusion above. In addition to 
the well-known orientation-tuned cortical cells, several studies have reported subpopula- 
tions of layer 4 striate cells with isotropic receptive fields in cat (Hirsch et al 2003), in tree 
shrew (Mooser et al 2004), and in monkey (Hubel and Wiesel 1968; Blasdel and Fitzpatrick 
1984), offering a potential cortical substrate for Burbeck and Kelly's (1981) masking results. 
However, there is at least one alternative interpretation of the cross-orientation masking 
results. Orientation-tuned cortical cells are thought to inhibit each other (Morrone et al 1982; 
Bonds 1989; Heeger 1992), producing cross-orientation suppression, and this could be the 
basis for the cross-orientation masking described above (Foley 1994; Meese and Holmes 
2007, 2010; Cass et al 2009). A similar outcome might be achieved by the isotropic inhibitory 
neurons found by Hirsch et al (2003) in layer 4 of the primary visual cortex (Meese et al 2008; 
Roeber et al 2008). This general type of arrangement — involving suppressive interactions — is 
sometimes referred to as 'cross-channel' masking. There is also good psychophysical support 
for this model from a dual-masking paradigm (Ross et al 1993; Foley 1994; Holmes and 
Meese 2004), from contrast matching (Meese and Hess 2004), from experiments involving 
fine-pattern discriminations (Olzak and Thomas 1999), and from the analysis of the slope 
of the psychometric function (Meese and Holmes 2007; Meese et al 2008). See Meese and 
Holmes (2007, 2010) for details and reviews. 

The conclusions above point back to the simple textbook model of early vision (figure 1a). 
Unfortunately, though, they do not resolve the issue because very different psychophysical 
results have been taken to suggest the contrary. Kelly and Burbeck (1987) performed 
cross-orientation adaptation experiments — where fast flickering adapters and targets had 
orthogonal orientations — and found substantial threshold elevation. For the detecting 
mechanisms to be desensitized by this type of adapter, the implication was that they 
were unselective for adapter orientation (figure 1b). However, more recent evidence that 
subcortical magnocellular (but not parvocellular) cells are prone to adaptation (Solomon et 
al 2004) offers another interpretation: perhaps the cross-orientation adaptation effects were 
the result of desensitization of the isotropic LGN cells (the subunits) that feed the oriented 
cortical mechanisms [but see Crowder et al (2006) for an alternative view]. The association of 
the magnocellular pathway with the high-speed corner of spatio-temporal vision (Derrington 
and Lennie 1984) is also consistent with this interpretation. Thus, a second thrust towards 
the arrangement in figure 1b turns out to remain consistent with that in figure 1a. 

In a third strand of enquiry, Kelly and Burbeck (1987) also performed subthreshold 
summation experiments for cross-oriented gratings and found about 3 dB (a factor of √2) 
of summation between equally detectable components in the high-speed corner of spatio- 
temporal vision. This is more summation than expected for detection by independent 
oriented filter mechanisms (Phillips and Wilson 1984; Georgeson and Shackleton 1994), 
which is typically about 1.5 dB (a factor of ~1.2) according to several models of probability 
summation (Tyler and Chen 2000). This result takes us back again to the scheme in figure 1b, 
involving nonoriented detection mechanisms. 
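
The decibel values quoted above can be related to linear contrast factors with a short calculation. The sketch below is illustrative only: the Minkowski pooling rule with an exponent of 4 is an assumed, commonly used approximation to probability summation, not a model taken from the studies cited, and the function names are hypothetical.

```python
import math

def db_to_factor(db):
    """Convert a contrast ratio in dB (20 * log10 of the ratio) to a linear factor."""
    return 10 ** (db / 20.0)

def factor_to_db(factor):
    """Convert a linear contrast ratio to dB."""
    return 20.0 * math.log10(factor)

def minkowski_summation_db(n_components, exponent):
    """Summation (dB) for n equally detectable components under Minkowski pooling.

    The exponent is an assumption; a value near 4 is often used to
    approximate probability summation across independent mechanisms.
    """
    return factor_to_db(n_components ** (1.0 / exponent))

observed_factor = db_to_factor(3.0)          # about 1.41, close to sqrt(2)
independent_factor = db_to_factor(1.5)       # about 1.19
pooled_db = minkowski_summation_db(2, 4.0)   # about 1.5 dB for two components
```

With these assumptions, two independent oriented mechanisms yield roughly 1.5 dB of summation, so the 3 dB reported for the high-speed corner is about twice that (in dB).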

Kelly and Burbeck (1987) considered the results from their masking, adaptation, and 
summation experiments to provide good evidence for isotropic detecting mechanisms in the 
high-speed corner of spatio-temporal vision. However, the summation results were from just 
a single observer, and as we have seen, their interpretations of the masking and adaptation 
results are of questionable standing. Furthermore, there are some potential methodological 
problems with some of the experiments. For example, the main masking and adaptation 
experiments involved criterion-sensitive methods of assessment (method of adjustment 
and yes/no staircases). They also used a drifting adaptation stimulus, which added the 
potentially complicating factor of motion after-effects. Another potentially important factor 
is that Kelly and Burbeck (1987) performed their adaptation and subthreshold summation 
studies monocularly. No explanation was offered for this, and although cross-orientation 
masking effects have been found for binocular (Meese and Holmes 2007), monocular, and 
dichoptic mask and target arrangements (Baker et al 2007; Cass et al 2009; Meese and Baker 
2009), the question of whether monocular stimulation was an important factor for obtaining 
cross-orientation adaptation and summation effects (Kelly and Burbeck 1987) remains open. 
This is potentially important because it could have implications for the ocularity of the filter 
mechanisms that are available for decision making. 

1.3 Aims and outcomes 

Our main aims here were three-fold. First, we wanted to make meaningful comparisons 
across the three contrast detection paradigms above (masking, adaptation, and subthreshold 
summation) by including common spatio-temporal conditions and observers within a 
single study. We used a jittering adaptation stimulus to avoid motion after-effects (see 
methods for details) and two-alternative forced-choice (2AFC) methodology to avoid the 
problems associated with criterion-sensitive methods. We also measured the slopes of the 
psychometric functions in the masking experiment to provide additional constraints on 
data interpretation. Second, we performed the experiments using monocular, binocular, and 
dichoptic presentations of stimulus components to assess the ocularity of the mechanisms 
involved. This was to extend our ongoing investigation of binocular interactions in early 
spatio-temporal vision (eg, Meese et al 2006; Baker et al 2007; Meese and Summers 2009). 
Third, we wanted to develop a model of early spatio-temporal vision, consistent with the 
results of Kelly and Burbeck (Burbeck and Kelly 1981; Kelly and Burbeck 1987) but also more 
recent findings, including those of the experiments performed here. 

1.3.1 The value of masking, adaptation, and summation paradigms. Of the three psychophys- 
ical paradigms used here we consider subthreshold summation to be the most valuable, 
since it does not require detailed models or assumptions about suprathreshold interactions. 
Models of the interactions involved in masking are fairly well developed, and we do not 
aim to extend them here. The main purpose of our masking experiment was to provide 
the additional constraint [over that in the Burbeck and Kelly (1981) study] that comes from 
measuring the slope of the psychometric function. Owing to the potential complexity of 
the processes involved in adaptation — the arrangement and interactions between filter 
mechanisms and the form and loci of desensitization (Tolhurst et al 1973; Foley and Chen 
1997; Langley and Bex 2007; Crowder et al 2006) — we view this as a fairly blunt instrument 
and urge caution when interpreting results. Nevertheless, the simple model that we propose 
does a good job in accounting for most of our results. 

1.3.2 Possible filtering schemes for achromatic spatio-temporal vision. The general hypothe- 
ses that we wished to test are summarised in figure 1. In all cases, contrast detection is by 
oriented mechanisms for most of spatio-temporal space; the focus of interest is the high- 
speed corner (top left in each panel).(1) In figure 1a this is purely oriented (the 'textbook' 
model), in figure 1b it is purely nonoriented (Kelly and Burbeck 1987), and in figure 1c both 
types of mechanism exist. In each case the different mechanisms could be monocular and/or 
binocular. Our results favour the model in figure 1c, with the new insight that nonoriented 
high-speed achromatic mechanisms are strictly monocular (blue) whereas the others are 
binocular (red). We elaborate on the details of this scheme in the general discussion. 

2 Methods 

2.1 Overview of the experiments 

We ran five experiments in total. We refer to monocular masking, adaptation, and summation 
experiments as experiments 1, 2a, and 3a, respectively. We refer to the adaptation and sum- 
mation experiments in which we manipulated ocularity (monocular, binocular, dichoptic) 
as experiments 2b and 3b, respectively. We did not manipulate ocularity for the masking 
experiment here, but we have done so elsewhere (eg, Meese and Baker 2009). 

2.2 Apparatus and stimuli 

Stimuli were presented using either a ViSaGe (experiments 1, 2a, and 3a) or a VSG2/5 
(experiments 2b and 3b) stimulus generator (both from Cambridge Research Systems Ltd, 
Kent, UK), controlled by a PC. Each setup used a Clinton Monoray monitor (CRS) running 
at 120 Hz, with a maximum luminance output of 250 cd/m². The monitors were gamma 
corrected using standard techniques. In all experiments observers viewed the display through 
ferro-electric shutter goggles (CRS, model FE-1) to enable independent images to be shown 
to each eye using a frame interleaving technique. The goggles act as a neutral density filter of 
0.9 log units, which reduced the effective mean luminance of the display to 16 cd/m². 

Target stimuli were circular sine-phase Gabor patches, oriented at ±45 deg. from vertical, 
depending on the experimental conditions. The Gaussian spatial envelope had a full-width- 
at-half-height of 1.67 carrier cycles and the patch size scaled with spatial frequency. Our 
main spatial frequency was 0.5 c/deg. (all experiments), but some conditions were also run at 
2 c/deg. (experiments 2a and 3a) and 0.25 c/deg. (reported in the general discussion). Stimuli 
were temporally modulated using either a 15 Hz biphasic pulse for 0.25 c/deg. and 
0.5 c/deg. stimuli ('fast'; see figure 2c) or a single cycle of a 2 Hz sinusoidal modulation for 2 c/deg. 
stimuli ('slow'; figure 2f). This produced stimulus durations of 67 ms and 500 ms, respectively. 

In the adaptation paradigm (experiments 2a and 2b), the adapter was a large circular 
patch of sinusoidal grating, oriented at -45 deg. from vertical and with contrast of 80%. The 
total diameter of the adapter was 18 deg., with raised cosine blur the width of half a carrier 
cycle around the boundary (see figure 2d-e). If, instead, we had used an adapter matched to 
the size of the target, then small eye-movements during adaptation could result in the target 
region being underadapted. This is less likely to be a problem with our large adapting field. 

The adapter was phase-jittered to prevent local luminance adaptation. This was done by 
incrementing the phase angle by 90 deg. + x, where x is a random value in the range 0 to 180 deg. 
The jitter was constructed to be consistent with the temporal properties of the appropriate 
target stimulus: for 15 Hz conditions a phase shift occurred every 33 ms, and for the 2 Hz 
conditions it occurred every 250 ms. A control experiment confirmed that this produced 
similar results to the more conventional counter-phase flickering adapter. 
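
The phase-jitter rule described above can be sketched as follows. This is a minimal illustration, assuming that x is drawn uniformly from 0 to 180 deg. and that the update interval is half a flicker period (which reproduces the 33 ms and 250 ms values quoted above); the function names and the seed are hypothetical.

```python
import random

def jittered_phases(n_updates, seed=None, start_deg=0.0):
    """Return a sequence of adapter phases (deg.) under the jitter rule.

    Each update advances the phase by 90 deg. + x, with x assumed to be
    uniform in [0, 180] deg., so every step lies between 90 and 270 deg.
    """
    rng = random.Random(seed)
    phases = [start_deg % 360.0]
    for _ in range(n_updates):
        step = 90.0 + rng.uniform(0.0, 180.0)
        phases.append((phases[-1] + step) % 360.0)
    return phases

def update_interval_ms(flicker_hz):
    """Half a flicker period in ms (an assumed reading of the timing rule)."""
    return 1000.0 / (2.0 * flicker_hz)

phases = jittered_phases(100, seed=1)
fast_interval = update_interval_ms(15.0)  # about 33 ms
slow_interval = update_interval_ms(2.0)   # 250 ms
```

Because each step is at least 90 deg., the grating never lingers near one phase, so any one retinal location sees a changing luminance, which is the stated purpose of the jitter.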



(1) By 'speed' we mean the ratio of temporal frequency to spatial frequency (TF/SF). All of the stimuli 
in this study were flickering gratings. They did not drift. 
Stimulus contrast is reported as Michelson contrast in percent: $100\,(L_{max} - L_{min})/(L_{max} + L_{min})$. 
Figure 2. Spatial and temporal properties of the jittering or flickering (adapter) stimuli used in this 
study. None of our stimuli drifted. (a, b) Left- and right-oblique Gabor patches (±45 deg.), used as 
targets in all experiments. In experiment 1 they were also used as masks. (d, e) Adapter stimuli at 
the two main spatial frequencies (0.5 c/deg. and 2 c/deg.). Temporal waveforms were a ('fast') 15 Hz 
biphasic pulse (c) and a ('slow') single cycle of 2 Hz sinusoidal modulation (f). 

2.3 Procedures 

In all experiments we measured contrast-detection thresholds for oriented Gabor patches, 
sometimes in the presence of a mask. Observers were seated in a darkened room with their 
head in a chin support to which the shutter goggles were attached. The viewing distance was 
57 cm. All experiments used a temporal two-interval forced-choice (2IFC) paradigm, with an 
interstimulus interval of either 400 ms (DHB) or 500 ms (all other observers). The stimulus 
location was indicated by a quad of fixation points placed on a virtual circle with a radius of 
1.41 carrier cycles of the target stimulus (Summers and Meese 2009). 

Psychometric functions were fitted using a 3-parameter Weibull function to estimate 
threshold (81.6% correct after correcting for lapsing), slope, and lapse rate (Wichmann 
and Hill 2001a, 2001b). Data from each experimental block were fitted independently, 
and the thresholds and slopes were averaged across multiple blocks, and then across 
observers. A bootstrapping technique was used to resample the original fits to each individual 
psychometric function to produce threshold and slope distributions for statistical analyses. 
Significance testing was always two-tailed. 
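
The 81.6% threshold criterion follows from the form of the Weibull function for a two-interval task. The sketch below assumes the common parameterisation with a guess rate of 0.5 and a lapse rate that scales the upper asymptote; the exact parameterisation used for the fits is not given in the text, and the function name and parameter values are hypothetical.

```python
import math

def weibull_2ifc(contrast, alpha, beta, lapse=0.0, guess=0.5):
    """Proportion correct for a Weibull psychometric function in a 2IFC task.

    alpha is the threshold, beta the slope, lapse the lapse rate, and guess
    the chance level (0.5 for two intervals). This form is an assumption.
    """
    f = 1.0 - math.exp(-((contrast / alpha) ** beta))
    return guess + (1.0 - guess - lapse) * f

# With no lapsing, performance at contrast = alpha is 1 - 0.5/e, whatever the slope.
p_at_alpha_steep = weibull_2ifc(1.0, alpha=1.0, beta=5.0)
p_at_alpha_shallow = weibull_2ifc(1.0, alpha=1.0, beta=1.3)
```

Under this parameterisation the threshold parameter corresponds to about 81.6% correct once lapses are discounted, independently of slope, which is why threshold and slope can be reported as separate quantities.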

Specific procedures for each experiment were as described below. 

2.4 Experiment 1: monocular masking 

There were five conditions: (i) no mask, (ii, iii) co-oriented masks at 2% and 30% contrast, and 
(iv, v) cross-oriented masks at 2% and 30% contrast. The target was always a 0.5 c/deg. right 
oblique Gabor patch. All stimuli were presented monocularly to the right eye and the left 
eye viewed mean luminance. Preliminary staircases were run to determine the approximate 
sensitivity in each condition, in order to set target contrast levels for the main experiment. To 
achieve detailed measures of the psychometric functions, we used the method of constant 
stimuli, with six target contrast levels per condition. Each block consisted of 40 trials per 
contrast level presented in a random order for a given condition. Each observer completed 
four blocks, giving 960 trials per psychometric function. 

2.5 Experiment 2a: monocular adaptation 

The adaptation regimen consisted of two minutes of continuous adaptation at the start 
of a block, and 5 seconds of top-up adaptation before each trial. Between the top-up 
adaptation and the first 2IFC interval there was a gap of 400 ms. The adapter was always 
left oblique (-45 deg.), and the target was either left oblique (co-oriented) or right oblique 
(cross-oriented), with conditions blocked by target orientation. All stimuli were presented 
monocularly to the right eye, and the left eye viewed mean luminance. Baseline thresholds 
(involving no adaptation phase) were measured for both orientations in the right (target) eye 
in blocks at the beginning and end of the entire experimental series and on different days 
from the adaptation. Adaptation was performed for the 0.5 c/deg. and the 2 c/deg. stimuli on 
different days. Target contrast was determined by a pair of 3-down-1-up staircases, with a 
step size of 3 dB (where contrast in dB is twenty times the log10 of the Michelson contrast). 
Each adaptation block lasted around 15 minutes, and there were four blocks of trials for each 
condition. 
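
The staircase rule can be illustrated with a short simulation. This is a sketch only, assuming that contrast in dB is taken relative to 1% Michelson contrast and that a single staircase is tracked (the experiment interleaved a pair); the function names, starting level, and response sequence are hypothetical.

```python
import math

def percent_to_db(contrast_percent):
    """Contrast in dB: 20 * log10 of Michelson contrast in percent (assumed re 1%)."""
    return 20.0 * math.log10(contrast_percent)

def run_staircase(start_db, responses, step_db=3.0, n_down=3):
    """Return the contrast level (dB) presented on each trial.

    Three consecutive correct responses lower the level by one step;
    a single incorrect response raises it by one step.
    """
    level = start_db
    run = 0  # consecutive correct responses at the current level
    levels = []
    for correct in responses:
        levels.append(level)
        if correct:
            run += 1
            if run == n_down:
                level -= step_db
                run = 0
        else:
            level += step_db
            run = 0
    return levels

start = percent_to_db(10.0)  # 10% contrast is 20 dB
levels = run_staircase(start, [True, True, True, False, True])
```

In this example the level drops by 3 dB after the third correct response and returns to the starting level after the error on the fourth trial.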

2.6 Experiment 2b: multiple ocular adaptation conditions 

The methods were similar to those for experiment 2a, except that only one spatial frequency 
(0.5 c/deg.) was used, and there were three ocular arrangements of target and adapter. In 
the monocular condition the target and adapter were both presented to the right eye. This 
condition was identical to its equivalent from experiment 2a. In the dichoptic condition 
the adapter was presented to the left eye, and the target was presented to the right eye. 
In the binocular condition both adapter and target were presented to both eyes. Only one 
ocular condition was performed on any given day. Baseline thresholds were measured for 
the relevant binocular and monocular targets in the same way as above. 

2.7 Experiment 3a: monocular cross-orientation summation 

We measured subthreshold summation for orthogonally (cross-)oriented Gabor patches 
in the right eye using the method of constant stimuli. The range of target contrasts was 
determined by the results from experiments 1 and 2a. Thresholds were measured for the left- 
and right-oblique component stimuli (ie, Gabor orientations of -45 deg. and 45 deg.), and 
their sum (the compound), for both spatial frequencies. Preliminary analysis showed that 
the sensitivity for the two orientations was very similar and so results were averaged across 
component orientations for comparison with the compound stimulus when calculating the 
summation ratio (see Results). 

2.8 Experiment 3b: multiple ocular summation conditions 

This experiment was similar to experiment 3a, except that there were three ocular conditions, 
each performed with the 0.5 c/deg. stimulus. The monocular condition was as described 
for experiment 3a, and the binocular condition was the same except that the stimuli were 
presented to both eyes. In the dichoptic condition one target component (left oblique) was 
presented to the right eye, and the other target component (right oblique) was presented to 
the left eye. Thus, in the compound condition different eyes saw different stimuli. 

2.9 Observers and preliminary analysis 

Three undergraduate optometry students participated in experiments 1, 2a, and 3a as part of 
their course requirements (AJ, KB, SS). Experiments 2b and 3b were completed by one author 
(DHB) and four undergraduate students (JC, MD, RP, SB), one of whom was paid (SB), the 
others participating for course credit. All observers wore their required optical correction 
during testing and had normal stereopsis (ie, they were able to see depth in stereograms 
containing crossed and uncrossed disparities). 

Preliminary analysis of the results led us to exclude the data from two of the undergraduate 
observers in experiments 2b and 3b. For one observer (MD), the psychometric functions 
were remarkably unreliable as revealed by their unusually high standard errors (SE > 3 dB 
for more than half of the psychometric functions). For the other (JC) there was a marked 
instability in the baseline measures: contrast detection thresholds were 3 dB higher in the 
block of baselines measured at the end of the experiment compared with the first. This made 
it impractical to evaluate the after-effects of adaptation for this observer. Observers DHB and 
SB were subsequently recruited to replace the discarded datasets. 

3 Results and discussion 

3.1 Experiment 1: monocular masking 

The stimulus conditions for the monocular masking experiment are summarised in figure 3a. 
Example psychometric functions are shown for one observer (AJ) in figure 3b for each of the 
five mask conditions (see legend of figure 3b). Note that relative to baseline (black), these are 
shifted rightwards for the 30% contrast masks (dark blue), regardless of whether the mask was 
co-oriented or cross-oriented relative to the target, but that the slope of the psychometric 
function becomes very shallow only in the co-oriented condition (solid dark blue). 

Threshold elevation (TE) was calculated in the conventional way as follows: 
$TE = 20 \log_{10}(C_{mask}/C_{no\_mask})$, where $C_{mask}$ and $C_{no\_mask}$ are the contrast detection thresholds 
with and without the mask, respectively. Figure 3c shows threshold elevation averaged 
across the three observers (AJ, KB, and SS). (In this, and other bar charts, results for individual 
observers are shown by the coloured dots.) Average slopes of the psychometric functions are 
shown in figure 3d. 
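
The threshold elevation formula can be written as a small helper. This is a minimal sketch; the function name and the example thresholds are hypothetical and are not data from the experiments.

```python
import math

def threshold_elevation_db(c_test, c_baseline):
    """Threshold elevation in dB: 20 * log10(C_test / C_baseline).

    c_test is the threshold with the mask (or after adaptation) and
    c_baseline is the threshold without it; both must be positive.
    """
    if c_test <= 0 or c_baseline <= 0:
        raise ValueError("thresholds must be positive")
    return 20.0 * math.log10(c_test / c_baseline)

# Illustrative values: a fourfold rise in threshold is about 12 dB,
# and a threshold that falls to 1/2.24 of baseline is about -7 dB.
te_masking = threshold_elevation_db(4.0, 1.0)
te_facilitation = threshold_elevation_db(1.0, 2.24)
```

Positive values indicate masking and negative values indicate facilitation, so both limbs of a dipper function sit on the same scale.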

For threshold elevation statistical significance was evaluated by assessing the difference 
between the averaged bootstrapped populations for baseline and masked thresholds. For 
psychometric slopes we compared the bootstrapped populations for each condition to a 
value of β = 1.3, which is the characteristic of a linear psychometric function(2) (Foley and 
Legge 1981; Pelli 1985; Tyler and Chen 2000). Statistical significance is indicated in figure 3 
(and other figures) by one asterisk for p < 0.05 and two asterisks for p < 0.01. 

The co-oriented mask produced threshold elevation of around 12 dB (a factor of 4) at 
high contrasts (30%), but produced a threshold reduction at low contrasts (2%) of around 
7 dB (a factor of >2). These statistically significant effects are consistent with the well-known 
dipper function for pedestal masking (Legge and Foley 1980). The co-oriented mask reduced 
the slope of the psychometric function (Weibull β) substantially, from β > 5 with no mask 
to β = 1.8 for the low contrast mask, and β = 1.1 for the high contrast mask. The latter value 
is not significantly different from the prediction of β = 1.3, as anticipated by the linearizing 
effect of pedestal contrast for small signal increments (Foley and Legge 1981; Bird et al 2002; 
Meese et al 2006). 

The high-contrast cross-oriented mask produced significant threshold elevation of 
around 8 dB (ie, it was about 4 dB less potent than the co-oriented mask), whereas the 

(2) By this we mean the sigmoidal psychometric function produced by a linear system (ie, the 
psychometric function produced by a linear contrast transducer). When expressed as d' (ie, d' as 
a function of target contrast), the psychometric function for a linear system is linear and has a slope of 
unity. 
low-contrast cross-oriented mask did not affect threshold significantly. Although slopes for 
these conditions were shallower than for the baseline, they were both significantly steeper 
than β = 1.3, with values around β = 3. 

3.1.1 Interpretation of masking results. For a conventional within-channel model of masking, 
where masking derives from direct excitation of the target mechanism by the mask 
(Wilson et al 1983; see also Meese and Holmes 2010), the effects of cross-orientation 
masking might be taken to indicate nonoriented detecting mechanisms (Ferrera and 
Wilson 1985). However, in contradistinction from co-oriented masking: (i) the slope of the 
psychometric function was not linearized (it was not reduced to β = 1.3) (Meese and Holmes 
2007), and (ii) there was no facilitation (Foley 1994). This shows that different processes were 
involved for the two mask orientations. These results are consistent with previous studies of 
binocular cross-orientation masking (Ross and Speed 1991; Foley 1994; Meese 2004; Meese 
and Holmes 2007) and stand against a within-channel account, suggesting that our high- 
speed targets (low spatial frequency, high temporal frequency) were detected by oriented 
detection mechanisms subject to suppressive interactions from the mask. We consider the 
nature of these interactions in the general discussion. 
Figure 3. Masking experiment (experiment 1). (a) Stimulus configurations (mask plus lower contrast 
target) for co-oriented and cross-oriented conditions. The stimuli had a spatial frequency of 0.5 
c/deg. and a temporal frequency of 15 Hz ('fast'). (b) Example psychometric functions for observer 
AJ. (c) Threshold elevation for contrast detection. (d) Slopes of the psychometric function (β). Note 
the logarithmic ordinate. In this figure and others bars show results (±1 SE) averaged across three 
observers (here: AJ, KB, and SS), and the coloured circles are for individual observers. Asterisks indicate 
statistical significance, as described in the text. The dotted horizontal line indicates β = 1.3, which is 
the psychometric slope expected for a linear system. 
3.2 Experiment 2a: monocular adaptation 

The stimulus conditions for the adaptation experiments are summarised in figure 4a. 
Threshold elevation (TE) from adaptation was calculated in the conventional way as follows:
$\mathrm{TE} = 20\log_{10}(C_{\mathrm{adapt}}/C_{\mathrm{baseline}})$, where $C_{\mathrm{adapt}}$ and $C_{\mathrm{baseline}}$ are the contrast detection
thresholds with and without adaptation, respectively. This is shown for the two adaptation
conditions and two spatial frequencies in figure 4b, averaged across three observers (AJ, KB, 
and SS). The co-oriented adapter raised thresholds significantly — by around 6 dB (a factor of 
2) — at both spatial frequencies. When the adapter was cross-oriented, threshold elevation 
was weaker (2.2 dB), but significant in the fast condition (0.5 c/deg., 15 Hz flicker). There 
was no significant threshold elevation in the slow condition (2 c/deg., 2 Hz flicker) for the 
cross-oriented adapter.
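The decibel conversion used for threshold elevation can be sketched as follows. This is a minimal illustration only: the function names and the example thresholds are assumptions introduced here, not values or code from the study.

```python
import math

def threshold_elevation_db(c_adapt, c_baseline):
    """Threshold elevation in dB: 20 * log10(C_adapt / C_baseline)."""
    return 20.0 * math.log10(c_adapt / c_baseline)

def db_to_factor(db):
    """Convert a dB value back to a ratio of contrast thresholds."""
    return 10.0 ** (db / 20.0)

# Hypothetical thresholds (arbitrary contrast units), chosen only to
# show that a doubling of threshold corresponds to about 6 dB.
print(round(threshold_elevation_db(c_adapt=2.0, c_baseline=1.0), 2))

# The weaker cross-oriented effect (2.2 dB) expressed as a threshold ratio.
print(round(db_to_factor(2.2), 2))
```

On this scale, 0 dB means that adaptation left the threshold unchanged, and negative values indicate facilitation.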

3.2.1 Interpretation of monocular adaptation results. The standard view of adaptation is that 
the adapter desensitizes the visual mechanisms that respond to the adapter, with the caveat 
that some visual mechanisms do not adapt (eg, parvocellular cells in the LGN; Movshon and 
Lennie 1979; Solomon et al 2004). Therefore, these results provide evidence for a nonoriented 
component to the detection process in the high-speed corner of spatio-temporal vision. We 
will elaborate on what that might be in the general discussion.

Figure 4. Adaptation experiments. Stimulus configurations (a) are shown for the two orientation
conditions in experiments 2a and 2b (b and c, respectively) and for each of the three ocular conditions 
in experiment 2b (c). The 'fast' stimuli had a spatial frequency of 0.5 c/deg. and temporal frequency 
of 15 Hz. The 'slow' stimuli had a spatial frequency of 2 c/deg. and a temporal frequency of 2 Hz 
(see figure 2 for details). In (b) and (c) asterisks denote the conditions where threshold elevation for 
contrast detection was significantly different from 0 dB.

3.3 Experiment 2b: Multiple ocular adaptation conditions

We repeated the fast stimulus condition from experiment 2a in three ocular arrangements. 
The monocular condition was an exact replication of the condition from experiment 2a, 
but for three different observers (RP, DHB, and SB). The results were very similar to before, 
with significant adaptation after-effects of around 5.2 dB and 2.2 dB for the co-oriented 
and cross-oriented targets, respectively (figure 4c, blue). When the adapter and target were
presented binocularly (figure 4c, red), the adaptation after-effect was weaker than in the 
monocular case for each of the adapter orientations. For the co-oriented target the effect 
dropped marginally (to 4.4 dB), and for the cross-oriented target the effect was abolished 
(there was no significant effect). When the adapter and target were presented to different eyes 
(dichoptic condition), there was a small negative adaptation after-effect of around 1 dB (ie,
detection performance was slightly improved by adaptation). This was statistically significant
(p < 0.05) for both orientations (figure 4c, green).

3.3.1 Interpretation of adaptation within and between the eyes. If the nonoriented component
of adaptation identified in experiment 2 arose after binocular combination, then it
should have been evident in the dichoptic condition. The fact that it was not suggests that 
the effect arises within purely monocular mechanisms. Similar conclusions follow from 
preliminary reports of a related adaptation study by Cass (2010). 

The finding of facilitation in the dichoptic condition (and its inconsistency across 
observers) replicates an earlier finding by Baker et al (2007) and preliminary reports by 
others (Bex et al 2007; Cass 2010). As Baker et al (2007) pointed out, this might derive from 
the release of standing inhibition between cross-oriented mechanisms in different eyes. That 
is, the cross-oriented adaptation causes disinhibition across orientation and eye. Similarly, 
the release of standing inhibition between co-oriented mechanisms in different eyes would 
explain the co-oriented facilitation. (3) This type of inhibitory interaction between the eyes 
and its release by adaptation in the fellow eye might also explain why the after-effects were 
smaller in the binocular condition than in the monocular condition (figure 4c). 

3.4 Experiment 3a: Monocular cross-orientation summation 

Summation ratios (SR) were calculated in the conventional way as follows:
$\mathrm{SR} = 20\log_{10}(C_{1,2}/C_{1+2})$, where $C_{1,2}$ is the average detection threshold for the individual
components and $C_{1+2}$ is the detection threshold for the compound stimulus (the sum
of the two components). With this arrangement, SR = 6 dB (a factor of 2) would indicate
perfect linear summation, and SR = 0 dB (a factor of 1) would indicate no summation (ie, no
benefit of adding a second component to the first). Summation ratios are plotted in figure 5b
averaged across three observers (AJ, KB, and SS). Only modest levels of summation were
found at each spatio-temporal frequency, but were greater in the fast condition (0.5 c/deg.,
15 Hz flicker) than in the slow condition (2 c/deg., 2 Hz flicker) (p = 0.016).
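As a small sketch of this calculation, the function below computes a summation ratio from two component thresholds and a compound threshold. The text does not say how the component thresholds were averaged; averaging in log units (a geometric mean) is an assumption made here, as are the function name and the example thresholds.

```python
import math

def summation_ratio_db(component_thresholds, compound_threshold):
    """SR in dB from component thresholds and the compound threshold.

    Assumption: the component thresholds are averaged in log units.
    The compound threshold is expressed as the contrast of each
    component within the compound.
    """
    mean_log = sum(math.log10(c) for c in component_thresholds) / len(component_thresholds)
    return 20.0 * (mean_log - math.log10(compound_threshold))

# Hypothetical thresholds that reproduce the two benchmark cases.
print(round(summation_ratio_db([1.0, 1.0], 0.5), 2))  # perfect linear summation
print(round(summation_ratio_db([1.0, 1.0], 1.0), 2))  # no summation
```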

(3) Only one of the three observers here (SB) shows good evidence of the co-oriented dichoptic
facilitatory effect. Preliminary analysis of work to be published elsewhere found only weak evidence
for this in 3 of the 5 observers (up to 1.4 dB of facilitation). Clearly, even if real, the facilitatory effect is
typically small and inconsistent across observers. We consider this effect to be a minor observation
and one that is peripheral to the main conclusions and motivations in this paper.

3.4.1 Types of summation. Probability summation [sometimes called 'signal selection'; Meese
and Baker (2011)] is usually taken to be the minimal combination rule (Tyler and Chen 2000).
When this occurs between two sets of independent noisy linear mechanisms, the SR is
about 1.5 dB (Tyler and Chen 2000). Nonlinearities prior to probability summation would
reduce this value further. For example, a square-law transducer produces SR ≈ 0.75 dB in
conjunction with probability summation. A distinct alternative to probability summation is
signal combination, where signals are summed within a single mechanism. Most analyses
find that this is a more potent form of summation than probability summation (eg, see
Meese 2010; Meese and Summers 2009; Meese and Baker 2011). However, predictions are
complicated by the potential effects of spatial pooling (eg, Georgeson and Shackleton 1994;
Bergen et al 1979) and integration of early noise (Meese 2010), each of which can dilute the
predicted level of summation across orientation. For example, for our sine-phase Gabor
patches, linear summation across orientation followed by peak-picking (a MAX operator)
across space and late additive noise predicts SR = 5.6 dB (nearly a factor of 2). But it is easy to
show that this drops to 3.9 dB, 3.0 dB (a factor of √2), and 2.1 dB for Minkowski summation
over space using Minkowski exponents of 4 (fourth-root), 2 (quadratic; Meese 2010), and 1
(linear), respectively. (4)
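The dependence of the predicted summation ratio on the spatial pooling rule can be illustrated numerically. The sketch below assumes a linear transducer, late noise (so that threshold is inversely proportional to the pooled response), and a pair of orthogonal sine-phase Gabors with a circular Gaussian envelope. The envelope width (`sigma_cycles`), the sampling grid, and all function names are assumptions made for illustration, so the MAX prediction in particular need not equal the 5.6 dB quoted above.

```python
import numpy as np

def gabor_pair(sigma_cycles=1.0, half_width=4.0, n=801):
    """Orthogonal sine-phase Gabors; distances are in grating cycles."""
    x = np.linspace(-half_width, half_width, n)
    X, Y = np.meshgrid(x, x)
    envelope = np.exp(-(X**2 + Y**2) / (2.0 * sigma_cycles**2))
    vertical = envelope * np.sin(2.0 * np.pi * X)
    horizontal = envelope * np.sin(2.0 * np.pi * Y)
    return vertical, horizontal

def pooled_response(image, exponent):
    """Minkowski pooling over space; exponent = inf gives the MAX operator."""
    rectified = np.abs(image)
    if np.isinf(exponent):
        return rectified.max()
    return (rectified**exponent).sum() ** (1.0 / exponent)

def summation_ratio_db(exponent, **gabor_kwargs):
    """SR (dB) for the compound relative to one component.

    With a linear system and late noise, threshold is inversely
    proportional to the pooled response at unit component contrast.
    """
    vertical, horizontal = gabor_pair(**gabor_kwargs)
    single = pooled_response(vertical, exponent)
    compound = pooled_response(vertical + horizontal, exponent)
    return 20.0 * np.log10(compound / single)

for m in (np.inf, 4.0, 2.0, 1.0):
    print(m, round(float(summation_ratio_db(m)), 2))
```

Under these assumptions the quadratic case gives a factor of √2 because the cross term between the two orthogonal components sums to zero over space, and the predicted benefit shrinks as the pooling exponent falls towards 1.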

To establish evidence for signal combination across orthogonal orientations, we required 
that summation were significantly greater than the probability summation level of SR = 
1.5 dB. This criterion was met in the fast condition, with empirical summation levels just 
more than 2 dB. For the slow condition summation was actually a little less than 1.5 dB (see 
figure 5b).
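The 1.5 dB criterion, and the 0.75 dB value for a square-law transducer, can be recovered from a common approximation in which probability summation is treated as Minkowski pooling with an exponent of about 4 under a linear transducer. The sketch below relies on that approximation; the function name and its parameters are assumptions made for illustration.

```python
import math

def minkowski_sr_db(n_components=2, pooling_exponent=4.0, transducer_exponent=1.0):
    """Approximate SR (dB) for n equally detectable, independent components.

    A power-law transducer with exponent p applied before pooling with
    exponent m behaves like pooling over contrast with exponent m * p.
    """
    effective_exponent = pooling_exponent * transducer_exponent
    return 20.0 * math.log10(n_components ** (1.0 / effective_exponent))

print(round(minkowski_sr_db(), 2))                         # linear transducer
print(round(minkowski_sr_db(transducer_exponent=2.0), 2))  # square-law transducer
```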

3.4.2 Interpretation of summation results. These results (figure 5b) suggest there are mechanisms
in the high-speed corner of spatio-temporal vision that can perform signal combination
of luminance contrast across orthogonal orientations and that the observer can use these
mechanisms in contrast detection tasks. Isotropic cortical mechanisms (cells with circular 
receptive fields) are plausible candidates for this because they sum luminance modulations 
at all orientations. 

3.5 Experiment 3b: Multiple ocular summation conditions 

We also extended the cross-orientation summation experiment to multiple ocular conditions
using the fast stimuli (0.5 c/deg., 15 Hz flicker). The results are shown in figure 5c. The
monocular condition produced 3.1 dB of summation, a little more than the equivalent
condition from the different observers in experiment 3a, and as before, was significantly
greater than the probability summation prediction of 1.5 dB. (The average level of monocular
summation across experiments 3a and 3b was 2.6 dB; n = 6 observers.) When the stimuli
were presented either binocularly or dichoptically, summation was weaker (~1.8 dB) and
not significantly greater than 1.5 dB.

(4) These standard computations are fairly simple, but it is not easy to provide intuitions for the results,
and the equations involved do little to help (not shown). There are one or two points that we can make,
though. First, the usual intuition for linear summation between two equally detectable components
(for which the response to each is one arbitrary unit) with late additive noise (ie, limiting noise that
arises at the final stage, just before the decision variable) is that the summation ratio is 2 (6 dB),
because 1 + 1 = 2. For sine-wave gratings this intuition holds if a MAX operator is used over space so
that the computation is not affected by spatial pooling. For the sine-phase Gabor stimuli used here
the contrast modulation at right angles to the orientation of the bars means that the spatial peak
in the compound stimulus is the sum of two local contrasts, each of which is slightly less than the
Michelson contrast of each component. Therefore, the summation ratio is slightly less than 2 (here,
5.6 dB), assuming the spatial MAX operator again. Second, it is well known (eg, Bergen et al 1979)
that when pairs of differently oriented 1D luminance modulations (eg, gratings or Gabor patches)
are summed, the constructive and destructive interference across two-dimensional space (the beats)
means that when spatial pooling (ie, the combination of responses over space) is more potent than
a MAX operator, there is a reduction in the benefit of the second component. As the spatial pooling
exponent (ie, Minkowski exponent) is increased (ie, spatial pooling becomes more nonlinear) this
effect decreases. For example, a large Minkowski exponent (eg, > 10) represents highly nonlinear
spatial pooling and is a good approximation to the spatial MAX operator (above), which preserves the
intuitive summation ratio of 2 (6 dB) across grating orientation. [Note that the comments here are for
the situation where the limiting noise is late. When limiting noise is placed before the MAX operation,
the effects of pooling are often approximated by a Minkowski exponent of about 4, assuming a linear
transducer (Tyler and Chen 2000).]

Figure 5. Cross-orientation summation experiments (experiments 3a and 3b). Stimulus configurations
are shown in (a) and results for experiments 3a and 3b in (b) and (c), respectively. The 'fast' stimuli
had a spatial frequency of 0.5 c/deg. and temporal frequency of 15 Hz. The 'slow' stimuli had a spatial
frequency of 2 c/deg. and a temporal frequency of 2 Hz (see figure 2 for details). The horizontal dotted
lines in the results panels indicate SR = 1.5 dB: the prediction for the canonical model of probability
summation. Statistical tests were performed against this criterion, and significant results are denoted
by the asterisks at the top of the figures.

3.5.1 Interpretation of summation within and between the eyes. If the nonoriented detecting 
mechanism identified in experiment 3 were binocular, then the levels of dichoptic and 
monocular cross-orientation summation should have been the same because the binocular 
response would be the same for each of these conditions. The fact that the SR was less in the 
dichoptic condition suggests that the isotropic detecting mechanisms are purely monocular. 
But why should there be less summation for binocular stimuli than monocular stimuli? In 
fact, the scheme we are to propose does predict that this should happen, but we withhold 
explanation until after describing the model details in the general discussion. 



4 General discussion 




We performed masking, adaptation, and summation experiments for the contrast detection
of achromatic flickering Gabor patches. We found interactions across orientation in all
three paradigms. For adaptation and summation the effects were greater for the fast condition 
(0.5 c/deg., 15 Hz flicker; 30 deg./s) than the slow condition (2 c/deg., 2 Hz flicker; 1 deg./s), 
confirming previous reports of a departure from scale invariance (Kelly and Burbeck 1987).
Elsewhere this departure has also been reported for cross-orientation masking (Burbeck and
Kelly 1981; Meese and Holmes 2007, 2010). 
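The speeds quoted for the two conditions follow from dividing temporal frequency by spatial frequency. A one-line check is sketched below; the function name is an assumption, and because the stimuli flickered rather than drifted, the result is best read as a nominal speed.

```python
def nominal_speed_deg_per_s(temporal_hz, spatial_cpd):
    """Nominal speed (deg/s) = temporal frequency (Hz) / spatial frequency (c/deg)."""
    return temporal_hz / spatial_cpd

print(nominal_speed_deg_per_s(15.0, 0.5))  # fast condition
print(nominal_speed_deg_per_s(2.0, 2.0))   # slow condition
```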

The summation results provide good evidence for nonoriented detecting mechanisms in
the high-speed corner of spatio-temporal vision and reject the 'textbook' scheme involving
only oriented detectors (figure 1a). However, although the results of our masking experiment
confirm the existence of cross-orientation interactions, we doubt these derive from within-channel
activity in isotropic detecting mechanisms because of the steep psychometric
function and the lack of facilitation (see results section). It is more likely that the target
was detected by an oriented mechanism that was suppressed by the cross-oriented mask
than that the high-speed corner of spatio-temporal vision was dominated by nonoriented
mechanisms (figure 1b). This leaves the hypothesis in figure 1c: there are both nonoriented
and oriented mechanisms in the high-speed corner of spatio-temporal vision.

For the adaptation and summation experiments we also found greater cross-orientation 
effects for monocular conditions than dichoptic conditions. The summation result is new and 
suggests that the nonoriented detection mechanisms that we have investigated are purely 
monocular. We have denoted this with the dark blue colouring in figure 1c. The adaptation
result confirms some previous observations (Kelly and Burbeck 1987; Meese et al 2007; Bex 
et al 2007; Cass 2010) and is also consistent with the conclusion above. 

In the study here we were able to calculate a binocular summation ratio (not shown)
for co-oriented components from the monocular and binocular thresholds measured in
experiments 2b and 3b. On average, this was 3.9 dB for the fast target (n = 3 observers). This is
similar to previous studies (eg, Meese et al 2006) and confirms the existence of conventional
(co-oriented) binocular summation in the study here. Furthermore, in a preliminary report
by Georgeson and Meese (2007) binocular summation ratios were derived for a wide range
of spatio-temporal frequencies. Signal combination was found in all cases, providing firm
evidence for binocular mechanisms throughout spatio-temporal vision (red symbols in
figure 1c).

We present a summary of the full set of our current results in table 1. This is in anticipation 
of the qualitative assessment of our model in the following sections, where we cross reference 
the effect numbers in this table. 

4.1 A specific proposal 

How might early vision be wired to be consistent with our results? Figure 6 shows a simple 
neural hierarchy designed to achieve this, where figure 6a is for the high-speed corner 
of spatio-temporal vision, assessed by our 'fast' stimulus condition, and figure 6b is for 
everywhere else, assessed by our 'slow' stimulus condition (we discuss this generalization 
below). In figure 6a a pair of isotropic monocular filters at level 1 feed into an oriented 
binocular filter at level 2. The green dashed boxes indicate that all three mechanisms are 
susceptible to adaptation. For simplicity, we have shown binocular summation, orientation 
filtering, and adaptation at a single stage in level 2, though several sub-stages might be 
involved. The addressable outputs of all three filters pass through sigmoidal transducers 
(Legge and Foley 1980) before reaching the decision stage. [We note that detailed models of 
more elaborate datasets might involve multiple stages of contrast transduction (Meese and 
Summers 2009; Meese and Baker 2011)]. In principle, this contrast transduction could derive 
from accelerating square-law energy mechanisms (Duong and Freeman 2008; Meese 2010) 
and dynamic contrast gain control (Foley 1994). Figure 6b is identical to figure 6a except that 
(i) there are no monocular outputs from the isotropic mechanisms and (ii) the desensitizing 
effect of adaptation is limited to the orientation-tuned mechanism (green dashed box).

Table 1. Summary of the 18 effects (including 5 null effects) reported in this study. Two other
relevant effects (3 and 4) are reported from other studies (eg, Burbeck and Kelly 1981; Meese and
Holmes 2007). The coloured text in the final column indicates the relative success of the model in
figure 6: green = good; blue = simple refinements needed (these are omitted from figure 6 to simplify
the presentation); red = further work needed. 'Yes, with some elaboration' means that the model's
behaviour requires some explanation beyond that evident in the pictorial presentation. Two-tailed
significance testing of the experimental results was done using a bootstrapping technique. * Although
this result appears significant, the result is in the opposite direction from the effect that is described.
See effect 15 and the green checked bar in figure 4c.

| Effect number | Paradigm | Effect | Data figure | Significance level | Accounted for by the model in figure 6 |
|---|---|---|---|---|---|
| 1 | Masking | Pedestal masking | Figure 3c | < 0.001 | Yes |
| 2 | | Cross-orientation monocular masking | Figure 3c | < 0.001 | Yes, with some elaboration |
| 3 | | Cross-orientation dichoptic masking | Not in this study | N/A | No, but easily fixed |
| 4 | | More cross-orientation masking at higher stimulus speeds | Not in this study | N/A | No, but easily fixed |
| 5 | | Facilitation for low contrast pedestals | Figure 3c | < 0.001 | Yes |
| 6 | | No facilitation for low contrast cross-oriented masks | Figure 3c | 0.986 | Yes |
| 7 | | Shallow psychometric function for pedestal masking | Figure 3d | 0.249 | Yes |
| 8 | | Steep psychometric function for baseline | Figure 3d | < 0.001 | Yes |
| 9 | | Steep psychometric function for cross-oriented masking | Figure 3d | < 0.001 | Yes, with some elaboration |
| 10 | Adaptation | Threshold elevation for monocular cross-orientation adaptation | Figure 4b, c | < 0.001, < 0.001 | Yes |
| 11 | | No threshold elevation for dichoptic cross-orientation adaptation | Figure 4c | 0.014* | Yes |
| 12 | | No threshold elevation for binocular cross-orientation adaptation | Figure 4c | 0.474 | No. Not easily assessed qualitatively |
| 13 | | More monocular threshold elevation for co-oriented adapters than cross-oriented adapters | Figure 4b, c | < 0.001, < 0.001 | Yes, with some elaboration |
| 14 | | More threshold elevation at higher stimulus speeds | Figure 4b | 0.004 | |
| 15 | | Facilitation for dichoptic adaptation (co-oriented and cross-oriented) | Figure 4c | 0.036, 0.014 | No, but easily fixed |
| 16 | Summation | Monocular signal combination | Figure 5b, c | 0.040, < 0.001 | Yes |
| 17 | | No dichoptic signal combination | Figure 5c | 0.527 | Yes |
| 18 | | No binocular signal combination | Figure 5c | 0.182 | Yes |
| 19 | | Signal combination only at high speeds | Figure 5b | 0.016 | Yes |
| 20 | | Binocular summation of co-oriented components | Not shown | < 0.001 | Yes |

4.2 A successful model 

The model in figure 6a has six elements (all from the notional catalogue of standard 
psychophysical model parts), which reduces to three by symmetry (adaptable isotropic 
monocular filters, adaptable binocular oriented filter, output nonlinearities). Four changes 
to figure 6a are needed to produce figure 6b, which reduces to two by symmetry (remove 
monocular outputs, remove monocular adaptability). Arguably then, a total of five design
decisions were required to construct the model. Without elaboration or extension, this 
simple model is consistent with 13 of the 18 effects reported here (as we shall soon describe). 
A further four of these effects (effects 2, 9, 13, and 15) are easily explained with simple 
elaborations and refinements, as are another two relevant effects from other studies (3 and 4). 
Only one (effect 12) is not easily explained within the framework we propose and probably 
requires detailed quantitative modelling to be properly understood. This is beyond the scope 
of this study and the constraints of our data. In what follows we provide a brief overview of 
the qualitative relation between our model and all 20 effects in table 1 (18 from this study, 
including 5 null effects, and 2 from elsewhere), starting with adaptation, then summation, 
and finally masking. 

Figure 6. Proposed circuit diagrams for the mechanisms of achromatic spatio-temporal vision. (a)
Arrangement for isotropic monocular mechanisms and oriented binocular mechanisms in the high-speed
corner of figure 1c. (b) Arrangement for the oriented binocular mechanisms outside the
high-speed corner of figure 1c. The green dashed squares indicate the various loci of adaptation.
The assignment of levels 1 and 2 to subcortex and cortex is based on known physiology and is
not constrained by the results here. The implication is that the outputs of adaptable isotropic
filters are available for decision-making in the high-speed corner of spatio-temporal vision, but not
elsewhere. (The various pathways that might be involved in cross-channel suppression and interocular
suppression are omitted for clarity but have been shown in previous publications: Meese et al 2006;
Baker et al 2007; Baker and Meese 2007; Meese and Baker 2009, 2011).

4.2.1 Adaptation in the model and in humans. Threshold elevation occurs for monocular
cross-oriented adapters (effect 10) owing to the adaptable isotropic filters at level 1 in figure 6a
(Ohzawa et al 1985; Solomon et al 2004; Mante et al 2005; Duong and Freeman 2007). It does
not happen in the dichoptic case (effect 11) because there is no adaptable node (green
dashed box) that is common to orthogonal orientations in different eyes. However, why
should monocular adaptation be orientation tuned (effect 13)? (That is, why is threshold
elevation greater for co-oriented adapters than for cross-oriented adapters?) As the only
adaptable monocular route is isotropic, the implication is that the oriented binocular filter is
used for the detection. But why should this happen, given that this route takes a double hit
from adaptation? We suggest that the orientation tuning derives, in part, from linear spatial 
summation across an array of isotropic filter elements (Hubel and Wiesel 1962; Mooser et al 
2004). Thus, the oriented filter elements are more sensitive than their nonoriented subunits 
because of the benefit of spatial summation within their elongated receptive fields (Meese 
2010). Instead, we might have pursued a more complex argument (and model) involving 
oriented monocular filters. Indeed, we cannot rule out the possibility that such mechanisms 
exist, but we were not compelled to appeal to them here. But we do need the monocular 
isotropic outputs in figure 6a to account for the summation results, as we describe in the 
next section. 

Threshold elevation is greater at higher stimulus speeds (effect 14) because in this corner 
of spatio-temporal vision there are two stages of adaptation (figure 6a), compared with the 
single stage elsewhere (figure 6b). 
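The 'double hit' logic for monocular and dichoptic adapters can be sketched as a toy bookkeeping exercise in dB. This is not a fitted or quantitative version of the model: it assumes that the target is detected through the oriented binocular route, that sensitivity losses at the two levels simply add in dB, and that the loss values are placeholders (the level 1 value is borrowed from the 2.2 dB cross-oriented effect, and the level 2 value is hypothetical). All names are assumptions.

```python
def predicted_te_db(adapter_eye, adapter_ori, target_eye="L", target_ori="V",
                    fast=True, level1_loss_db=2.2, level2_loss_db=3.0):
    """Toy threshold elevation (dB) for a target detected via the oriented route."""
    te = 0.0
    # Level 1: the isotropic monocular filter adapts only for fast stimuli,
    # and only in the eye that saw the adapter (any orientation).
    if fast and adapter_eye == target_eye:
        te += level1_loss_db
    # Level 2: the oriented binocular filter adapts only when the adapter
    # shares the target's orientation (either eye).
    if adapter_ori == target_ori:
        te += level2_loss_db
    return te

print(predicted_te_db("L", "H"))              # monocular, cross-oriented, fast
print(predicted_te_db("L", "V"))              # monocular, co-oriented, fast
print(predicted_te_db("R", "H"))              # dichoptic, cross-oriented, fast
print(predicted_te_db("L", "H", fast=False))  # monocular, cross-oriented, slow
```

The sketch deliberately omits interocular suppression, so it cannot produce the dichoptic facilitation (effect 15) and says nothing about binocular adapters (effect 12).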

As explained above, there is no threshold elevation for cross-oriented dichoptic adaptation
(effect 11), but there is a facilitatory after-effect, and this is found for both orientations of 
the dichoptic adapter (effect 15). This is not explained by the model in figure 6, but it can 
be achieved by the simple extension described in the results: if the model were to include 
suppressive interactions across eyes and orientation (Sengpiel et al 1995; Walker et al 1998; 
Li et al 2005; Sengpiel and Vorobyov 2005; Ding and Sperling 2006; Meese et al 2006; Baker et 
al 2007; Moradi and Heeger 2009), then when adapted, these could result in disinhibition 
and facilitation (see Meese et al 2007). 

In figure 6a we would expect binocular cross-orientation adaptation to be just as potent as 
its monocular counterpart. The fact that it was not (effect 12) points to a potential weakness 
in the model. It is possible that this might have something to do with interocular suppression 
between the binocular adapters, which is not explicit in the simple scheme sketched out in 
figure 6. For example, we have found that when a mask in one eye is matched to a mask in the 
other eye, this can reduce the potency of the original mask. We say that the binocular match 
of the masks gates the level of masking (Meese and Hess 2005; Baker et al 2007). Similar
effects have been found for binocular rivalry (Nichols and Wilson 2009). We now see that a
binocular match of a cross-oriented adapter also reduces the potency of cross-orientation 
adaptation (effects 10 and 12). Another possibility is that there is disinhibition from the 
interocular component of the binocular adapter (see our explanation of effect 15 above), 
which cancels out the adaptation effect of desensitization. However, the variability in our 
data (across observers) dissuaded us from attempting to develop a detailed quantitative 
model of this effect. 

4.2.2 Summation in the model and in humans. Cross-oriented signal combination is seen 
for pairs of monocular components (effect 16) because contrast is linearly summed across 
orientation by the isotropic filters at level 1 (figure 6a) and this output is available to the 
decision stage. We have argued above that the oriented filter elements are more sensitive 
than the isotropic filter elements, but presumably, the extra stimulus component that is 
visible to the isotropic filter overcomes this setback. This does not happen for binocular 
stimuli, however (effect 18), because in that case the more sensitive oriented binocular filter 
also benefits from summing a pair of parallel components from the two eyes. Summing a 
pair of monocular components across orientation within the less sensitive isotropic filters 
cannot beat this binocular benefit. 

There is no cross-orientation signal combination for dichoptic presentation (effect 17) 
because the two components do not stimulate a common isotropic filter. 

Cross-orientation signal combination occurs only at high speeds (effect 19) because there 
are no outputs from isotropic filters at lower speeds (figure 6b).

Finally, there is binocular summation of co-oriented components (effect 20) because the 
model contains oriented binocular mechanisms at all spatio-temporal frequencies. 
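The qualitative summation predictions in this section can be reproduced with a deliberately stripped-down sketch of the scheme. It assumes linear filters, linear binocular summation, and a deterministic decision based on the single most active available output, and it ignores the contrast nonlinearity and probability summation, so every no-combination case comes out at 0 dB rather than about 1.5 dB. The gains `g_iso` and `g_ori` are hypothetical; the only requirement is that the isotropic gain is lower than the oriented gain but more than half of it.

```python
import math

EYES = ("L", "R")
ORIS = ("V", "H")

def model_outputs(stim, g_iso=0.7, g_ori=1.0, iso_outputs=True):
    """Outputs available to the decision stage; stim maps (eye, ori) -> contrast."""
    outputs = []
    # Level 2: oriented binocular filters sum one orientation across the eyes.
    for ori in ORIS:
        outputs.append(g_ori * sum(stim.get((eye, ori), 0.0) for eye in EYES))
    # Level 1: isotropic monocular filters sum across orientation within an eye.
    # Their outputs are assumed to be readable only in the high-speed corner.
    if iso_outputs:
        for eye in EYES:
            outputs.append(g_iso * sum(stim.get((eye, ori), 0.0) for ori in ORIS))
    return outputs

def threshold(pattern, **kwargs):
    """Contrast scaling at which the largest output reaches a criterion of 1."""
    return 1.0 / max(model_outputs(pattern, **kwargs))

def sr_db(component, compound, **kwargs):
    return 20.0 * math.log10(threshold(component, **kwargs) / threshold(compound, **kwargs))

mono_v = {("L", "V"): 1.0}
mono_vh = {("L", "V"): 1.0, ("L", "H"): 1.0}
dich_vh = {("L", "V"): 1.0, ("R", "H"): 1.0}
bin_v = {("L", "V"): 1.0, ("R", "V"): 1.0}
bin_vh = {(eye, ori): 1.0 for eye in EYES for ori in ORIS}

print("monocular", round(sr_db(mono_v, mono_vh), 2))
print("dichoptic", round(sr_db(mono_v, dich_vh), 2))
print("binocular", round(sr_db(bin_v, bin_vh), 2))
print("slow, monocular", round(sr_db(mono_v, mono_vh, iso_outputs=False), 2))
```

Under these assumptions only the monocular compound at high speed gains from the isotropic route, which mirrors the pattern of effects 16 to 19.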

4.2.3 Masking in the model and in humans. Pedestal masking (effect 1) in the model derives 
from the compressive region of the sigmoidal contrast nonlinearity at high contrasts (Legge 
and Foley 1980). 

Monocular cross-orientation masking (effect 2) could arise from either of two routes. 
First, it could arise from cross-orientation inhibition between oriented mechanisms in the 
cortex (Albrecht and Geisler 1991; Heeger 1992; Foley 1994; Haun and Essock 2010; Spratling
2011), an approach that could also accommodate cross-oriented dichoptic masking (effect 
3) (Ding and Sperling 2006; Baker and Meese 2007; Baker et al 2007; Meese and Baker 2009). 
This is not shown in figure 6, but the scheme is easily extended to do this, and we have made 
explicit proposals elsewhere (Baker et al 2007; Baker and Meese 2007). The simplest method 
here would be to replace the sigmoidal output nonlinearities with dynamic contrast gain 
control with a broadly tuned suppression field (Foley 1994; Meese and Holmes 2010). This 
would involve inhibitory interactions amongst differently oriented mechanisms at level 2 
in figure 6. (5) A second explanation of effect 2 is that masking arises within the isotropic 
filters at level 1 (Klein et al 1997; Meese and Holmes 2010). If the output characteristic of 
these cells were such that the cross-oriented mask reduced the effective contrast available 
to the subsequent stage of orientation filtering, then this would produce suppression while 
retaining the characteristics of the output nonlinearity at the decision stage. One way that 
this could be achieved would be from an isotropic suppression field, such as that identified 
in the retina (Shapley and Victor 1978) and LGN (Bonin et al 2005) of cats [see Meese and 
Holmes (2010) for further discussion, but also see Spratling (2011)]. 

(5) The details are more complicated than this. Elsewhere (Baker et al 2007; Meese and Baker 2009)
we have argued that the suppressive interactions arise after orientation tuning but before binocular
summation. This implies an additional stage between stages 1 and 2 in figure 6.

The fact that cross-orientation masking is more potent at high stimulus speeds (effect 4)
is not an emergent property of our model here, but it could be easily accommodated by the
appropriate setting of suppression weights in the contrast gain control at either level 1 or
level 2 (Meese and Holmes 2007; Meese and Baker 2009).

The pedestal facilitation (effect 5) is explained in the conventional way (Legge and Foley 
1980) — by arranging that the output nonlinearity accelerates at low contrasts, similar to the 
square-law process needed for energy (or power) detection (Meese 2010). This also produces 
a steep psychometric function at detection threshold (effect 8) (Foley and Legge 1981). Since 
none of our proposals for cross-orientation masking attribute the effect to within-channel 
excitation of isotropic detecting mechanisms (ie, we do not require that the observer is able 
to access the outputs of the isotropic filters in the masking experiment), the cross-oriented 
mask does not drive the sigmoidal output nonlinearity at the decision stage. Therefore, there 
is no facilitation (effect 6), and the psychometric function is steep for cross-orientation 
masking (effect 9) (Holmes and Meese 2004). However, it is shallow for pedestal masking 
(effect 7) because of the linearizing effect for small signal increments caused by the drive up 
the transducer (Foley and Legge 1981; Meese et al 2006). 
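The contrast between pedestal and cross-oriented masking can be illustrated with a generic sigmoidal transducer in which a cross-oriented mask contributes only to the divisive term. This is a sketch of the gain-control style of extension mentioned above, not the model in figure 6 itself. The exponents, the constant `z`, the suppressive weight `w`, and the response criterion `k` are illustrative assumptions rather than fitted values.

```python
def response(c, mask=0.0, p=2.4, q=2.0, z=1.0, w=0.5):
    """Sigmoidal transducer: accelerating at low contrast, compressive at high.

    A cross-oriented mask adds only to the denominator (pure suppression).
    """
    return c**p / (z + c**q + w * mask**q)

def increment_threshold(pedestal=0.0, mask=0.0, k=0.2):
    """Smallest target increment that raises the response by the criterion k."""
    base = response(pedestal, mask)
    lo, hi = 0.0, 1e4
    for _ in range(200):  # bisection; the response is monotonic in contrast
        mid = 0.5 * (lo + hi)
        if response(pedestal + mid, mask) - base < k:
            lo = mid
        else:
            hi = mid
    return hi

t0 = increment_threshold()
print("baseline", round(t0, 3))
print("low pedestal", round(increment_threshold(pedestal=t0), 3))
print("high pedestal", round(increment_threshold(pedestal=10.0), 3))
print("low cross mask", round(increment_threshold(mask=t0), 3))
print("high cross mask", round(increment_threshold(mask=10.0), 3))
```

With these settings a pedestal near detection threshold lowers the increment threshold (facilitation) and a high-contrast pedestal raises it (masking), whereas the cross-oriented mask can only raise thresholds because it never drives the numerator.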

4.2.4 Model summary and interpretation. For the purposes of this paper the important 
features of our model are illustrated in figures 1c and 6. As with previous proposals (Hubel and
Wiesel 1962; Mooser et al 2004), oriented filters are fed by (arrays of) isotropic filters. However, 
the isotropic filters are also available for decision making, but only within monocular 
channels selective for low spatial frequencies and high temporal frequencies. Isotropic filters 
are also prone to adaptation in this corner of spatio-temporal vision. Taken together with 
results from neurophysiology (Solomon et al 2004), our model of achromatic vision might 
be interpreted as follows: the observer can tap into an image-representation at an earlier 
stage in the (transient) magnocellular stream than in the (sustained) parvocellular stream 
(the situation for chromatic vision is probably different, as we discuss below). However, the 
purpose of this monocular privilege remains unclear. 

4.3 Comparisons with Kelly and Burbeck (1987) 

The results of the adaptation and summation experiments here (experiments 2b and 3b) 
suggest that Kelly and Burbeck (1987) were wise (or fortuitous) to conduct their experiments 
monocularly. Indeed, for the experimental conditions used here this type of presentation 
is essential for revealing each type of cross-orientation interaction. This also follows from 
preliminary observations of adaptation recently reported by Cass (2010). 

4.3.1 Monocular summation. In the high-speed corner of spatio-temporal vision the average 
level of cross-orientation summation found by us (2.6 dB) was only a little less than that 
found by Kelly and Burbeck (1987) (~ 3 dB). We note that Kelly and Burbeck reported results 
for only a single observer and that one of our six observers (RP) produced summation levels 
that were in excess of this (figure 5c). We have presented quantitative arguments (see the 
results section for experiment 3a) to suggest we might expect only modest effects of 
cross-orientation summation from isotropic mechanisms (2 to 3 dB) if spatial summation were a 
significant factor (see also Bergen et al 1979), consistent with our average results. 
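
For readers less familiar with decibel units, the summation and adaptation levels quoted in this section can be converted to threshold ratios. The sketch below assumes the usual convention for contrast, dB = 20 log10(ratio); the function names are ours and purely illustrative. Under that convention 3 dB is roughly a factor of 1.41 and 6 dB roughly a factor of 2.

```python
import math

def db_to_ratio(db):
    """Convert a level in dB to a contrast ratio, assuming dB = 20 * log10(ratio)."""
    return 10.0 ** (db / 20.0)

def ratio_to_db(ratio):
    """Convert a contrast ratio to dB under the same assumed convention."""
    return 20.0 * math.log10(ratio)

# Levels quoted in the text for summation and adaptation.
for label, db in [("our mean summation", 2.6),
                  ("Kelly and Burbeck summation (approx.)", 3.0),
                  ("Kelly and Burbeck adaptation (approx.)", 6.0)]:
    print(f"{label}: {db:.1f} dB = threshold ratio {db_to_ratio(db):.2f}")
```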

4.3.2 Monocular adaptation. A comparison between our monocular cross-adaptation results 
and those of Kelly and Burbeck reveals notable differences. We have replicated their finding 
that thresholds are elevated in the high-speed corner of spatio-temporal vision by adapting 
to an orientation at right angles to the target, but our effects (typically a little less than 3 dB) 
are notably smaller than theirs (typically a little greater than 6 dB). Why might this be? We 
wondered whether we might get larger effects at faster speeds, so DHB performed 
cross-orientation adaptation for 0.25 c/deg. adapter and target stimuli at 15 Hz (60 deg./s). This 
increased the after-effect of adaptation a little for this observer (from 2.2 dB at 0.5 c/deg. 
to 4.6 dB at 0.25 c/deg.), confirming that 'speed' is important (Kelly and Burbeck 1987) 
but still not achieving the higher levels found by Kelly and Burbeck. Furthermore, our main 
spatio-temporal conditions were comparable to some of those used by Kelly and Burbeck, 
so the question of why their effects were larger remains. Kelly and Burbeck used a slightly 
higher contrast adapter than us (95% versus 80%), a higher mean luminance (90 cd/m² 
versus 16 cd/m²), and drifting adapters rather than our jittering adapters. Another factor 
is that the method used for monocular stimulation might be important. We used shutter 
goggles to present uniform mean luminance to the irrelevant eye. Unfortunately, Kelly and 
Burbeck (1987) do not report how they achieved monocularity, though an occluder is a 
likely possibility. It is also possible that the difference is due to the criterion-sensitive (yes/no) 
methods used by Kelly and Burbeck (1987) compared with the criterion-free 2IFC method 
used here. 
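
The nominal 'speed' quoted above is the ratio of temporal frequency to spatial frequency. The short sketch below reproduces that arithmetic for the stimulus conditions mentioned in this section; the function name is ours, and treating flickering stimuli as having a nominal speed in this way is a simplifying assumption.

```python
def nominal_speed(temporal_hz, spatial_cpd):
    """Nominal speed in deg/s: temporal frequency (Hz) / spatial frequency (c/deg)."""
    return temporal_hz / spatial_cpd

# (spatial frequency in c/deg, temporal frequency in Hz), as quoted in the text.
conditions = {
    "faster control condition": (0.25, 15.0),
    "main fast condition": (0.5, 15.0),
    "slow condition": (2.0, 2.0),
}
for name, (sf, tf) in conditions.items():
    print(f"{name}: {tf:g} Hz / {sf:g} c/deg = {nominal_speed(tf, sf):g} deg/s")
```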

Another factor relates to the method used for measuring the baselines for contrast 
detection thresholds. Previous studies have adopted two different approaches. In one (eg, 
Baker et al 2007) the baseline blocks of trials are identical to the experimental blocks, except 
that the adapter contrast is set to 0% (ie, observers adapt to a blank screen). This has the 
benefit of using consistent experimental protocols across the measures being compared 
but means that the baseline involves adaptation to mean luminance. This is a questionable 
comparison, since the visual system is normally 'adapted' to the natural images encountered 
in the real world (Elliot et al 2011). In the other approach (eg, Langley and Bex 2007) the 
adaptation periods (both pre and top-up) are removed from the experimental sequence. 
This avoids the artificial process of blank adaptation but has the disadvantage that the 
experimental sequence is not identical for the baseline and experimental measures that are 
to be compared. We used the second of these methods here, but it is not clear which method 
Kelly and Burbeck (1987) used. To check whether the method might be important, we ran a 
control experiment to compare baselines measured each way. For DHB and SB, with a fast 
stimulus (0.5 c/deg., 15 Hz flicker), the baselines were 1.9 dB and 2.2 dB lower, respectively, 
when blank adaptation was used (typical standard errors were 0.27 dB and 0.85 dB for DHB 
and SB, respectively). For the slow stimulus (2 c/deg., 2 Hz flicker) the effects were smaller 
(average difference of 0.5 dB) and in opposite directions for the two observers. Thus, a case 
could be made that we have underestimated the magnitude of the adaptation after-effects 
in our fast conditions by about 2 dB. Fortunately, this does not change any of our main 
conclusions or the model architecture that we propose in figure 6. 

Whether any one or some combination of the factors above is responsible for the different 
levels of cross-orientation adaptation reported across studies remains unclear. 

4.4 Further considerations 

4.4.1 Cascading stages of adaptation and spatial selectivity. One feature of our model is the 
cascading stages of adaptation (levels 1 and 2 in figure 6a). We note that similar cascades 
have been suggested before (Georgeson and Schofield 2002; Shady et al 2004; Langley and 
Bex 2007). For example, Georgeson and Schofield (2002) used the tilt after-effect to reveal a 
cascade of orientation-tuned stages (beyond the scope of the present study), whereas Langley 
and Bex (2007) identified an initial adaptable transient stage followed by a parallel pair of 
adaptable temporal channels. Whether our level 1 filter is related to the initial temporal 
filter of Langley and Bex (2007) is not clear and would require a better understanding of the 
spatial selectivity of the filters in each case. For example, we have represented our initial 
stage (level 1) with a familiar icon for a spatially opponent receptive field (figure 6). However, 
there is nothing in our data to demand that our level 1 filter is pattern selective (as we discuss 
further in the next section). It could be that its primary function is that of temporal selectivity. 

1 Facilitation from a dichoptic adapter (co-oriented and cross-oriented) was also confirmed for the 
0.25 c/deg. adapter and target. 

4.4.2 Generalization of spatio-temporal conditions. The study here has concentrated on the 
high-speed corner of achromatic spatio-temporal vision. Burbeck and Kelly (1981), Meese 
and Holmes (2007), and Meese and Baker (2009) surveyed the entire spatio-temporal space 
using a cross-orientation masking paradigm for binocular, monocular, and dichoptic masks 
and targets. No similarly extensive study has been performed for adaptation or summation 
using criterion-free methods, but the yes/no adaptation study of Kelly and Burbeck (1987) 
covered a substantial spatio-temporal region. In sum, it is possible that our results and 
conclusions from the 2 c/deg., 2 Hz Gabor stimuli do not generalize to the entire achromatic 
space outside the high-speed corner. Nonetheless, we are aware of no results to suggest that 
they might not, and the Kelly and Burbeck study suggests that they do, at least for adaptation. 

4.4.3 Chromaticity versus achromaticity. In stark contrast to the above, we do not generalize our 
conclusions (figure 1a and figure 6) to chromatic spatio-temporal space. In a preliminary 
report Mullen et al (2010) used the summation paradigm to reveal nonoriented chromatic 
(red/green) mechanisms — but not achromatic mechanisms — at low spatial frequency and 
low temporal frequency. Like their achromatic counterparts here, these mechanisms 
appeared to be strictly monocular. Intriguingly, the nonoriented monocularity is consistent 
with single-cell recordings in layer 4Cα and layer 4Cβ of primary visual cortex (Hubel 
and Wiesel 1968, 1977; Blasdel and Fitzpatrick 1984) for the achromatic and chromatic 
studies, respectively. The implication is that these cells not only provide convergent inputs to 
orientation-tuned filters (Mooser et al 2004), but their outputs (or similar) are also available 
to the observer for decision making. 

4.4.4 Our conclusions derive from the simplest model arrangement. Our model in figure 6 
is the simplest one we could devise to account for the majority of our data. As always, 
more complex models might be derived that can also account for the results, and these 
might change our interpretations and conclusions. For example, our claim that observers 
are able to access the outputs of a monocular stage of processing (for the high-speed 
stimuli) follows from the diminution of cross-orientation summation when the components 
were presented to different eyes (experiment 3b). However, another possibility is that this 
arises from interocular suppression across orientation between the targets in the different 
eyes. Indeed, we have proposed such pathways to account for the minor adaptation after- 
effects of dichoptic cross-orientation facilitation that we observed, attributing that effect to 
disinhibition. However, the problem here is to devise a model in which cross-orientation 
suppression could arise from components that are themselves only at threshold. Indeed, 
studies where the relevant masking measures have been made (eg, Baker et al 2007; Meese 
and Baker 2009) suggest that suppression is not found until the mask component is above its 
own detection threshold. 

4.4.5 Further considerations. Notwithstanding our comments above, our conclusions regarding 
perceptual access to an adaptable isotropic mechanism might require some elaboration. 
Using uniform flicker (ie, no spatial pattern) as adapting and test patterns, Shady et al (2004) 
report that flicker frequencies that are too high to be seen can serve as effective adapters. 
Meier and Carandini (2002) found a similar result for cross-orientation masking. In other 
words, each of these studies points to what is presumably an isotropic mechanism that can 
influence visual perception but for which the observer cannot access the output. These 
results can be reconciled with our account if a low-pass temporal-frequency filter (Langley 
and Bex 2007) were inserted between our level 1 filter and the output to the decision variable. 
As it seems likely that this is subcortical [eg, see Meier and Carandini (2002) for a brief review], 
this temporal filtering is presumably positioned before binocular convergence in our scheme. 

5 Summary 

Our criterion-free psychophysical experiments confirm earlier findings of Kelly and 
Burbeck (Burbeck and Kelly 1981; Kelly and Burbeck 1987): cross-orientation interactions 
can be found in the high-speed corner of spatio-temporal vision but are diminished or absent 
elsewhere. However, Kelly and Burbeck attributed the results from all three paradigms (masking, 
adaptation, and summation) to a common explanation: that the detecting mechanisms 
are isotropic. In the light of a greater number of experimental conditions and analyses, we 
interpret these results rather differently, arguing that only the summation paradigm reveals 
nonoriented detecting mechanisms directly. Adaptation possibly reveals adaptable isotropic 
subunits of oriented filters, and masking reveals cross-orientation interactions from either 
oriented or isotropic mechanisms onto oriented detectors. We also conclude that the high- 
speed corner of spatio-temporal vision contains both oriented and nonoriented filters, but 
that the nonoriented variety is strictly monocular (figure 1c). We have argued that 16 of the 18 
psychophysical effects reported here are consistent with a simple model of these processes 
involving only five design decisions and some simple elaborations. One further effect and 
two other relevant results in the literature are easily accommodated with simple extensions 
to our model. A notable property of our model (figure 6) is that if a filter is available for 
decision making, it is also adaptable. 